Smart Paper Notes

Investigating Bias with a Synthetic Data Generator: Empirical Evidence and Philosophical Interpretation

Alessandro Castelnovo , Riccardo Crupi , Nicole Inverardi , Daniele Regoli , Andrea Cosentini

Categories: Machine Learning (Statistics) | Artificial Intelligence | Machine Learning

2022-09-13

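Machine learning applications are becoming increasingly pervasive in our society. Because these decision-making systems rely on data-driven learning, the risk is that they will systematically propagate the biases embedded in the data. In this paper, we propose to analyze biases by introducing a framework for generating synthetic data with specific types of bias and their combinations. We delve into the nature of these biases and discuss their relationship to moral and justice frameworks. Finally, we use the proposed synthetic data generator to run experiments in different scenarios with various bias combinations. We thereby analyze the impact of biases on performance and fairness metrics in both non-mitigated and mitigated machine learning models.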
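As a rough illustration of what a "fairness metric" in such experiments can look like, the sketch below computes the demographic parity difference, a commonly used group-fairness measure. The abstract does not say which metrics the authors evaluate, so the choice of metric, the function name, and the toy data are all assumptions made for illustration.

```python
def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest positive-prediction rate across groups.

    predictions: list of 0/1 model decisions (hypothetical toy input)
    groups: list of group labels, one per prediction
    """
    rates = {}
    for g in set(groups):
        # Collect the decisions made for members of group g.
        members = [p for p, label in zip(predictions, groups) if label == g]
        rates[g] = sum(members) / len(members)
    return max(rates.values()) - min(rates.values())


# Toy data: group "a" receives 3 of 4 positive decisions, group "b" 1 of 4.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_difference(predictions, groups)
```

A value of 0 means every group receives positive decisions at the same rate; larger values indicate a larger disparity.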

Counterfactual Explanations as Interventions in Latent Space

Riccardo Crupi , Alessandro Castelnovo , Daniele Regoli , Beatriz San Miguel Gonzalez

Categories: Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2021-06-14

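Explainable Artificial Intelligence (XAI) is a set of techniques for understanding both the technical and non-technical aspects of Artificial Intelligence (AI) systems. XAI is crucial in helping to meet the increasingly important demand for trustworthy AI, whose fundamental characteristics include respect for human autonomy, prevention of harm, transparency, and accountability. Counterfactual explanations aim to give end users a set of features (and their corresponding values) that need to be changed in order to achieve a desired outcome. Current approaches rarely take into account the feasibility of the actions needed to realize the proposed explanations; in particular, they fall short of considering the causal impact of those actions. In this paper, we present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology for generating counterfactual explanations that capture, by design, the underlying causal relations in the data, while also providing feasible recommendations for reaching the proposed profile. Moreover, our methodology has the advantage that it can be set on top of existing counterfactual generator algorithms, thus minimizing the complexity of imposing additional causal constraints. We demonstrate the effectiveness of our approach through a set of experiments on synthetic and real datasets (including a proprietary dataset from the financial domain).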

The Zoo of Fairness metrics in Machine Learning

Alessandro Castelnovo , Riccardo Crupi , Greta Greco , Daniele Regoli

Categories: Machine Learning | Machine Learning (Statistics)

2021-06-01

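In recent years, the problem of addressing fairness in Machine Learning (ML) and automated decision-making has attracted considerable attention in the scientific communities working on Artificial Intelligence. A plethora of definitions of fairness in ML have been proposed, each reflecting a different notion of what counts as a "fair decision" in situations that affect individuals in the population. The precise differences, implications, and "orthogonality" between these notions have not yet been fully analyzed in the literature. In this work, we try to bring some order to this zoo of definitions.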

Localising In-Domain Adaptation of Transformer-Based Biomedical Language Models

Tommaso Mario Buonocore , Claudio Crema , Alberto Redolfi , Riccardo Bellazzi , Enea Parimbelli

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-20

In the era of digital healthcare, the huge volumes of textual information generated every day in hospitals constitute an essential but underused asset that could be exploited with task-specific, fine-tuned biomedical language representation models, improving patient care and management. For such specialized domains, previous research has shown that fine-tuned models stemming from broad-coverage checkpoints can benefit greatly from additional training rounds over large-scale in-domain resources. However, these resources are often unavailable for less-resourced languages like Italian, preventing local medical institutions from employing in-domain adaptation. To reduce this gap, our work investigates two accessible approaches to deriving biomedical language models in languages other than English, taking Italian as a concrete use case: one based on neural machine translation of English resources, favoring quantity over quality; the other based on a high-grade, narrow-scoped corpus natively written in Italian, thus preferring quality over quantity. Our study shows that data quantity is a harder constraint than data quality for biomedical adaptation, but that concatenating high-quality data can improve model performance even with relatively size-limited corpora. The models published from our investigations have the potential to unlock important research opportunities for Italian hospitals and academia. Finally, the lessons learned from the study offer valuable insights towards building biomedical language models that generalize to other less-resourced languages and different domain settings.

Quantum Clustering with k-Means: a Hybrid Approach

Alessandro Poggiali , Alessandro Berti , Anna Bernasconi , Gianna Del Corso , Riccardo Guidotti

Categories: Machine Learning

2022-12-13

Quantum computing is a promising paradigm based on quantum theory for performing fast computations. Quantum algorithms are expected to surpass their classical counterparts in terms of computational complexity for certain tasks, including machine learning. In this paper, we design, implement, and evaluate three hybrid quantum k-Means algorithms that exploit different degrees of parallelism. Each algorithm incrementally leverages quantum parallelism to reduce the complexity of the cluster assignment step, down to a constant cost. In particular, we exploit quantum phenomena to speed up the computation of distances. The core idea is that the distances between records and centroids can be computed simultaneously, thus saving time, especially for big datasets. We show that our hybrid quantum k-Means algorithms can be more efficient than the classical version while still obtaining comparable clustering results.
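For orientation, the cluster assignment step mentioned in the abstract can be sketched in its classical form as follows. This is the standard classical k-Means assignment, not the authors' quantum procedure; the function name and the toy data are assumptions for illustration. The nested loop over records and centroids is the distance computation that the abstract says is executed simultaneously in the hybrid algorithms.

```python
import math


def assign_clusters(records, centroids):
    """Return, for each record, the index of its nearest centroid (Euclidean)."""
    assignments = []
    for record in records:
        # Classical cost: one distance per (record, centroid) pair.
        distances = [math.dist(record, centroid) for centroid in centroids]
        assignments.append(distances.index(min(distances)))
    return assignments


# Toy data (hypothetical): two well-separated groups of 2-D points.
records = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (9.0, 10.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
labels = assign_clusters(records, centroids)
```

With n records and k centroids, this classical step computes n × k distances per iteration, which is why it is a natural target for parallelization.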

Multimodal and Explainable Internet Meme Classification

Abhinav Kumar Thakur , Filip Ilievski , Hông-Ân Sandlin , Alain Mermoud , Zhivar Sourati , Luca Luceri , Riccardo Tommasini

Categories: Artificial Intelligence | Natural Language Processing | Machine Learning

2022-12-11

Warning: this paper contains content that may be offensive or upsetting. In the current context where online platforms have been effectively weaponized in a variety of geo-political events and social issues, Internet memes make fair content moderation at scale even more difficult. Existing work on meme classification and tracking has focused on black-box methods that do not explicitly consider the semantics of the memes or the context of their creation. In this paper, we pursue a modular and explainable architecture for Internet meme understanding. We design and implement multimodal classification methods that perform example- and prototype-based reasoning over training cases, while leveraging both textual and visual SOTA models to represent the individual cases. We study the relevance of our modular and explainable models in detecting harmful memes on two existing tasks: Hate Speech Detection and Misogyny Classification. We compare the performance between example- and prototype-based methods, and between text, vision, and multimodal models, across different categories of harmfulness (e.g., stereotype and objectification). We devise a user-friendly interface that facilitates the comparative analysis of examples retrieved by all of our models for any given meme, informing the community about the strengths and limitations of these explainable methods.

CALIME: Causality-Aware Local Interpretable Model-Agnostic Explanations

Martina Cinquini , Riccardo Guidotti

Categories: Artificial Intelligence | Machine Learning

2022-12-10

A significant drawback of eXplainable Artificial Intelligence (XAI) approaches is the assumption of feature independence. This paper focuses on integrating causal knowledge into XAI methods to increase trust and help users assess the quality of explanations. We propose a novel extension to a widely used local, model-agnostic explainer that explicitly encodes causal relationships in the data generated around the input instance being explained. Extensive experiments show that our method outperforms the original explainer in both the fidelity with which it mimics the black box and the stability of its explanations.

LSVL: Large-scale season-invariant visual localization for UAVs

Jouko Kinnari , Riccardo Renzulli , Francesco Verdoja , Ville Kyrki

Categories: Robotics

2022-12-07

Localization of autonomous unmanned aerial vehicles (UAVs) relies heavily on Global Navigation Satellite Systems (GNSS), which are susceptible to interference. Especially in security applications, robust localization algorithms independent of GNSS are needed so that autonomous UAVs can operate dependably even under interference. Typical non-GNSS visual localization approaches rely on a known starting pose, work only on a small map, or require known flight paths before a mission starts. We consider the problem of localization with no information on the initial pose or planned flight path. We propose a solution for global visual localization on maps of up to 100 km², based on matching orthoprojected UAV images to satellite imagery using learned season-invariant descriptors. We show that the method can determine the heading, latitude, and longitude of the UAV with a lateral translation error of 12.6-18.7 m in as few as 23.2-44.4 updates from an uninformed initialization, even when there is a significant seasonal appearance difference (winter-summer) between the UAV image and the map. We evaluate the characteristics of multiple neural network architectures for generating the descriptors, as well as likelihood estimation methods that provide fast convergence and low localization error. We also evaluate the operation of the algorithm on real UAV data and measure its running time on a real-time embedded platform. We believe this is the first work able to recover the pose of a UAV at this scale and rate of convergence while allowing a significant seasonal difference between camera observations and the map.

Quantum median filter for Total Variation image denoising

Simone De Santis , Damiana Lazzaro , Riccardo Mengoni , Serena Morigi

Categories: Computer Vision

2022-12-02

In the new computing paradigm known as quantum computing, researchers from all over the world are taking their first steps in designing quantum circuits for image processing, through a difficult process of knowledge transfer. This effort, named Quantum Image Processing, is an emerging research field driven by the powerful parallel computing capabilities of quantum computers. This work goes in this direction and takes on the challenging development of a powerful image denoising method, the Total Variation (TV) model, in a quantum environment. The proposed Quantum TV is described and its sub-components are analysed. Despite the natural limitations of current quantum devices, the experimental results show denoising performance competitive with the classical variational TV counterpart.

Machine learning-accelerated chemistry modeling of protoplanetary disks

Grigorii V. Smirnov-Pinchukov , Tamara Molyarova , Dmitry A. Semenov , Vitaly V. Akimkin , Sierk van Terwisga , Riccardo Francheschi , Thomas Henning

Categories: Machine Learning

2022-09-27

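Aims. With the large amount of molecular emission data from (sub)millimeter observatories and the incoming James Webb Space Telescope infrared spectroscopy, access to fast forward models of the chemical composition of protoplanetary disks is of paramount importance. Methods. We used a thermo-chemical modeling code to generate a diverse population of protoplanetary disk models. We trained a K-nearest neighbors (KNN) regressor to instantly predict the chemistry of other disk models. Results. We show that, thanks to correlations between the local physical conditions in the adopted protoplanetary disk models, the chemistry can be reproduced accurately using only a small subset of physical conditions. We discuss the uncertainties and limitations of this method. Conclusions. The proposed method can be used for Bayesian fitting of line emission data to retrieve disk properties from observations. We present a pipeline for reproducing the same approach on other sets of disk chemical models.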
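To make the KNN regression idea concrete, here is a minimal pure-Python sketch of how such a regressor predicts a target by averaging the targets of the nearest training points. It is not the authors' pipeline: the function name, the two-feature inputs, and the toy values are hypothetical and stand in for whatever physical conditions and chemical quantities the real models use.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to `query`."""
    # Rank training points by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


# Hypothetical toy data: each row is a pair of local physical conditions,
# each target a single chemical quantity produced by a slow, detailed model.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0]
prediction = knn_predict(train_x, train_y, (0.1, 0.1), k=3)
```

Once the training set has been generated, a prediction only requires a neighbor lookup, which is what makes this kind of surrogate fast compared with rerunning the full model.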
